Preserving Privacy in Spoken Language Databases
Authors
Abstract
Goal-oriented spoken dialog systems aim to identify the intents of humans, expressed in natural language, and take actions accordingly to satisfy their requests. State-of-the-art data-driven spoken dialog systems are trained using large amounts of task data, which is usually transcribed and then labeled by humans, a very expensive and laborious process. Hence, sharing and reuse of this data is extremely important for research and development of spoken language processing systems. On the other hand, these utterances may include confidential personal information about the speakers, such as social security numbers or credit card numbers. In this paper, we describe data sanitization approaches for natural language utterances to protect the privacy of the speakers. The challenge in sanitization is ensuring that the spoken dialog system models trained on the sanitized data perform as well as those trained on the original data. We show that, by hiding task-dependent named entities, we can preserve the privacy of the speakers and still achieve comparable accuracy.
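As a rough illustration of what hiding named entities in an utterance can look like, the following minimal sketch replaces two kinds of sensitive values with class tokens. It is not the paper's method: the abstract does not specify the entity classes or how they are detected, so the regular expressions, the token names `<SSN>` and `<CREDIT_CARD>`, and the function `sanitize` are all hypothetical, and the sketch assumes the transcripts render numbers as digits.

```python
import re

# Hypothetical patterns and token names, chosen only for illustration.
# Assumes transcriptions contain numerals rather than spelled-out digits.
PATTERNS = [
    # 16 digits, optionally separated by single spaces or hyphens.
    ("<CREDIT_CARD>", re.compile(r"\b(?:\d[ -]?){15}\d\b")),
    # US social security number in the 3-2-4 hyphenated form.
    ("<SSN>", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
]


def sanitize(utterance: str) -> str:
    """Replace each matched sensitive value with its class token."""
    for token, pattern in PATTERNS:
        utterance = pattern.sub(token, utterance)
    return utterance


if __name__ == "__main__":
    print(sanitize("my social is 123-45-6789"))
    print(sanitize("charge it to 4111 1111 1111 1111 please"))
```

Replacing a value with a class token rather than deleting it keeps the sentence structure intact, which is one plausible way to limit the effect on models trained from the sanitized text.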
Similar resources
A centralized privacy-preserving framework for online social networks
There are some critical privacy concerns in current online social networks (OSNs). Users' information is disclosed to entities that were not supposed to access it. Furthermore, the notion of friendship is inadequate in OSNs, since the degree of social relationships between users changes dynamically over time. Additionally, users may define similar privacy settings for their f...
Limiting Disclosure in Hippocratic Databases
Preserving data privacy is of utmost concern in many sectors, including e-commerce, healthcare, government, and retail, where individuals entrust others with their personal information every day. Often, the organizations collecting the data will specify how the data is to be used in a privacy policy, which can be expressed either electronically or in natural language. We describe a data model f...
SQL-Based Fuzzy Query Mechanism Over Encrypted Database
With the development of cloud computing and big data, data privacy protection has become an urgent problem to solve. Data encryption is the most effective way to protect privacy; however, it will change the data format and result in: 1. database structure and application software will be changed; 2. structured query language (SQL) operations cannot work properly, especially in SQL-based fuzzy q...
Evaluation of advanced techniques for multi-party privacy-preserving record linkage on real-world health databases
The linking of multiple (three or more) health databases is challenging because of the increasing sizes of databases, the number of parties among which they are to be linked, and privacy concerns related to the use of personal data such as names, addresses, or dates of birth. This entails a need to develop advanced scalable techniques for linking multiple databases while preserving the privacy ...
Privacy Preserving Data Mining
Many organizations collect large amounts of data through data mining. A key value of huge databases today is technical or financial research. In such huge collections of data, a key issue arises: privacy. Privacy is needed because of personal interests, medical databases, or business interests. Due to privacy infringement while performing the data mining operations this is often not possib...